BLIP Image Captioning Large (MOCHa)
License: MIT
This is the official fine-tuned version of the BLIP-Large model, optimized with the MOCHa reinforcement learning framework on the MS-COCO dataset to mitigate open-vocabulary caption hallucinations.
Task: Image-to-Text
Library: Transformers
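
Below is a minimal captioning sketch using the standard BLIP classes from Transformers. The Hub repository ID is an assumption inferred from the card title, and the example image URL is only for illustration; substitute your own values as needed.

```python
# Minimal sketch: caption an image with this checkpoint via Transformers.
# The repository ID below is an assumption based on the card title;
# replace it with the actual Hub ID if it differs.
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

model_id = "moranyanuka/blip-image-captioning-large-mocha"  # assumed Hub ID

# Load the processor (image preprocessing + tokenizer) and the captioning model.
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

# Fetch an example image (any RGB image works).
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Unconditional captioning: encode the image, generate, and decode the caption.
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```

The same checkpoint can also be loaded through the high-level `pipeline("image-to-text", model=model_id)` interface if you prefer a one-liner over the explicit processor/model calls.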